BACKGROUND OF THE INVENTION

Field of the Invention 

The present invention relates to a method for
analyzing fundamental frequency information contained
in voice samples, and to a voice conversion method and
system implementing said analysis method.

Description of Related Art 

Depending on the nature of the sounds to be
produced, production of speech, and in particular of
voiced sounds, may entail vibration of the vocal
cords, which manifests itself through the presence in
the speech signal of a periodic structure having a
fundamental period, the inverse of which is referred to
as the fundamental frequency or pitch.

In certain applications, such as voice conversion, 
aural rendering is of vital importance, and effective
control of the parameters linked to prosody, including 
the fundamental frequency, is required in order to 
obtain acceptable quality. 

Thus, numerous methods currently exist for 
analyzing the fundamental frequency information
contained in voice samples.

These analyses enable the determination and 
modeling of fundamental frequency characteristics. For 
example, methods exist which enable determination of 
the slope or an amplitude scale of the fundamental
frequency over an entire database of voice samples.

Knowledge of these parameters enables 
modifications of speech signals to be made, for example 
by fundamental frequency scaling between source and 
target speakers, in such a way as to globally respect
the mean and the variation of the fundamental frequency 
of the target speaker. 

However, these analyses enable only general 
representations to be obtained, and not fundamental 
frequency representations whose parameters can be
defined, and are therefore not relevant, in particular 
to speakers whose speaking styles differ. 

The object of the present invention is to overcome 
this problem by defining a method for analyzing 
fundamental frequency information of voice samples,
making it possible to define a fundamental frequency 
representation whose parameters can be defined. 

BRIEF SUMMARY OF THE INVENTION

For this purpose, the subject of the present
invention is a method for analyzing fundamental
frequency information contained in voice samples,
characterized in that it comprises at least:

- a step for the analysis of the voice samples
grouped together in frames in order to obtain, for each
sample frame, spectrum-related information and
information relating to the fundamental frequency;

- a step for the determination of a model
representing the common characteristics of the spectrum
and fundamental frequency of all samples; and

- a step for the determination of a fundamental
frequency prediction function exclusively according to
spectrum-related information on the basis of said model
and voice samples.

According to other characteristics of this 
analysis method: 

- said analysis step is adapted to supply said
spectrum-related information in the form of cepstral
coefficients;

- said analysis step comprises:

  - a sub-step for modeling voice samples
  according to a sum of a harmonic signal and a noise
  signal;

  - a sub-step for estimating frequency
  parameters, and at least the fundamental frequency, of
  the voice samples;

  - a sub-step for synchronized analysis of the
  fundamental frequency of each sample frame; and

  - a sub-step for estimating the spectral
  parameters of each sample frame;

- it furthermore comprises a step for normalizing
the fundamental frequency of each sample frame in
relation to the mean of the fundamental frequencies of
the analyzed samples;

- said step for the determination of a model
corresponds to the determination of a model by a mixture
of Gaussian densities;

- said model determination step comprises:

  - a sub-step for determining a model
  corresponding to a mixture of Gaussian densities; and

  - a sub-step for estimating the parameters of
  the mixture of Gaussian densities on the basis of the
  estimation of the maximum likelihood between the
  spectral information and the fundamental frequency
  information of the samples and of the model;

- said step for the determination of a prediction
function is implemented on the basis of an estimator of
the realization of the fundamental frequency,
knowing the spectral information of the samples;

- said step for determining the fundamental
frequency prediction function comprises a sub-step for
determining the conditional expectation of the
realization of the fundamental frequency, knowing
the spectral information, on the basis of the
a posteriori probability that the spectral information
is obtained on the basis of the model, the conditional
expectation forming said estimator.

The invention also relates to a method for the
conversion of a voice signal pronounced by a source
speaker into a converted voice signal whose
characteristics resemble those of a target speaker,
comprising at least:

- a step for determining a function for the
transformation of spectral characteristics of the
source speaker into spectral characteristics of the
target speaker, implemented on the basis of voice
samples of the source speaker and the target speaker;
and

- a step for transforming spectral information of
the voice signal of the source speaker to be converted
with the aid of said transformation function,

characterized in that it furthermore comprises:

- a step for determining a fundamental frequency
prediction function exclusively according to
spectrum-related information for the target speaker, said
prediction function being obtained with the aid of an
analysis method as defined above; and

- a step for predicting the fundamental frequency
of the voice signal to be converted by applying said
fundamental frequency prediction function to said
transformed spectral information of the voice signal of
the source speaker.

According to other characteristics of this 
conversion method: 

- said step for determining a transformation
function is implemented on the basis of an estimator of
the realization of the target spectral
characteristics, knowing the source spectral
characteristics;

- said step for determining a transformation
function comprises:

  - a sub-step for modeling the source and target
  voice samples according to a sum model of a harmonic
  signal and a noise signal;

  - a sub-step for aligning the source and target
  samples; and

  - a sub-step for determining said
  transformation function on the basis of the calculation
  of the conditional expectation of the realization of
  the target spectral characteristics, knowing the
  realization of the source spectral
  characteristics, the conditional expectation forming
  said estimator;

- said transformation function is a spectral
envelope transformation function;

- it furthermore comprises a step for analyzing
the voice signal to be converted, adapted to supply
said spectrum-related information and information
relating to the fundamental frequency;

- it furthermore comprises a synthesis step,
enabling the formation of a converted voice signal on
the basis of at least the transformed spectral
information and the predicted fundamental frequency
information.

The invention also relates to a system for
converting a voice signal pronounced by a source
speaker into a converted voice signal whose
characteristics resemble those of a target speaker,
said system comprising at least:

- means for determining a function for
transforming spectral characteristics of the source
speaker into spectral characteristics of the target
speaker, receiving, at their input, voice samples of
the source speaker and of the target speaker; and

- means for transforming spectral information of
the voice signal of the source speaker to be converted
by applying said transformation function supplied by
said means for determining a transformation function,

characterized in that it furthermore comprises:

- means for determining a fundamental frequency
prediction function exclusively according to
spectrum-related information for the target speaker, adapted for
the implementation of an analysis method as defined
above, on the basis of voice samples of the target
speaker; and

- means for predicting the fundamental frequency
of said voice signal to be converted by applying said
prediction function determined by said means for
determining a prediction function to said transformed
spectral information supplied by said transformation
means.

According to other characteristics of this system: 

- it furthermore comprises:

  - means for analyzing the voice signal to be
  converted, adapted to supply, at their output,
  spectrum-related information and information relating
  to the fundamental frequency of the voice signal to be
  converted; and

  - synthesis means enabling the formation of a
  converted voice signal on the basis of at least the
  transformed spectral information supplied by the
  transformation means and the predicted fundamental
  frequency information supplied by the prediction means;

- said means for determining a transformation
function are adapted to supply a spectral envelope
transformation function;

- it is adapted for the implementation of a voice
conversion method as defined above.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS 

The invention will be more readily understood from
a reading of the description which follows, provided
purely as an example and with reference to the attached
drawings, in which:

- Fig. 1 is a flowchart of an analysis method
according to the invention;

- Fig. 2 is a flowchart of a voice conversion
method implementing the analysis method according to
the invention; and

- Fig. 3 is a functional block diagram of a voice
conversion system, enabling the implementation of the
method according to the invention described in figure 2.

DETAILED DESCRIPTION OF THE INVENTION 

The method according to the invention shown in
figure 1 is implemented on the basis of a database of
voice samples containing sequences of natural speech.

The method starts with a step 2 for analyzing
samples by grouping them together in frames, in order
to obtain, for each sample frame, spectrum-related
information, in particular information relating to
the spectral envelope, and information relating to the
fundamental frequency.

In the embodiment described, this analysis step 2
is based on the use of a model of a sound signal in the
form of a sum of a harmonic signal and a noise signal,
according to a model normally referred to as "HNM"
(Harmonic plus Noise Model).

Moreover, the embodiment described is based on a
representation of the spectral envelope by the discrete
cepstrum.

A cepstral representation in fact enables
separation, in the speech signal, of the component
relating to the vocal tract from the resulting source
component, corresponding to the vibrations of the vocal
cords and characterized by the fundamental frequency.

Thus, analysis step 2 comprises a sub-step 4 for
modeling each voice signal frame into a harmonic part
representing the periodic component of the signal,
consisting of a sum of L harmonic sinusoids with
amplitude $A_l$ and phase $\varphi_l$, and a noisy part
representing the frication noise and glottal excitation
variation.

This can therefore be formulated as follows:

$$s(n) = h(n) + b(n)$$

where

$$h(n) = \sum_{l=1}^{L} A_l(n)\cos\big(\varphi_l(n)\big)$$

The term h(n) therefore represents the harmonic
approximation of the signal s(n), and b(n) the noisy part.
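
As an illustration of the decomposition s(n) = h(n) + b(n), the following minimal Python sketch builds the harmonic part of one frame. It assumes, for simplicity, amplitudes and a fundamental frequency that are constant over the frame and zero initial phases; the function names and parameters are hypothetical and are not taken from the embodiment described.

```python
import math


def harmonic_part(amplitudes, f0_hz, sample_rate_hz, num_samples):
    """Return h(n) = sum_l A_l * cos(phi_l(n)) for n = 0..num_samples-1.

    Assumption: phi_l(n) = 2*pi*l*f0*n/fs, i.e. constant F0 and zero
    initial phase within the frame.
    """
    signal = []
    for n in range(num_samples):
        value = 0.0
        for l, amplitude in enumerate(amplitudes, start=1):
            phase = 2.0 * math.pi * l * f0_hz * n / sample_rate_hz
            value += amplitude * math.cos(phase)
        signal.append(value)
    return signal


def hnm_frame(amplitudes, f0_hz, sample_rate_hz, noise):
    """Return s(n) = h(n) + b(n), where `noise` holds the samples b(n)."""
    harmonic = harmonic_part(amplitudes, f0_hz, sample_rate_hz, len(noise))
    return [h + b for h, b in zip(harmonic, noise)]
```

With a zero noise part, `hnm_frame` returns the harmonic approximation itself, which is the case in which the approximation is exact.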

Step 2 then comprises a sub-step 5 for estimating,
for each frame, frequency parameters, of the
fundamental frequency in particular, for example by
means of an autocorrelation method.
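
The autocorrelation approach mentioned above can be sketched as follows. This is only one possible variant, given for illustration: the search range of 60 to 400 Hz, the normalization, and the rule of keeping the shortest lag that scores nearly as well as the best one are assumptions made here, not details of the embodiment.

```python
import math


def estimate_f0_autocorr(frame, sample_rate_hz, f0_min_hz=60.0,
                         f0_max_hz=400.0, tolerance=0.99):
    """Estimate the F0 (in Hz) of one frame by normalized autocorrelation.

    Returns None when the frame is silent or no positive correlation exists.
    """
    lag_min = max(1, int(sample_rate_hz / f0_max_hz))
    lag_max = min(int(sample_rate_hz / f0_min_hz), len(frame) - 1)
    scores = {}
    for lag in range(lag_min, lag_max + 1):
        head = frame[:len(frame) - lag]
        tail = frame[lag:]
        energy = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        if energy > 0.0:
            scores[lag] = sum(a * b for a, b in zip(head, tail)) / energy
    if not scores:
        return None
    best = max(scores.values())
    if best <= 0.0:
        return None
    # Assumed heuristic: keep the shortest lag that is nearly as good as the
    # best one, to avoid locking onto a multiple of the true period.
    for lag in sorted(scores):
        if scores[lag] >= tolerance * best:
            return sample_rate_hz / lag
    return None
```

For example, a frame synthesized at 8000 Hz with a 200 Hz fundamental and one harmonic has a period of 40 samples, and the function returns 8000 / 40 = 200 Hz.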

In a conventional manner, this HNM analysis 
supplies the maximum voicing frequency. As a variant, 
this frequency may be arbitrarily defined, or may be 
estimated by other known means. 

This sub-step 5 is followed by a sub-step 6 for
synchronized analysis of the fundamental frequency of
each frame, enabling estimation of the parameters of
the harmonic part and the parameters of the signal
noise.

In the embodiment described, this synchronized
analysis corresponds to the determination of the
harmonic parameters through minimization of a weighted
least squares criterion between the full signal and its
harmonic breakdown, whose difference corresponds, in the
embodiment described, to the estimated noise signal. The
criterion, denoted as E, is equal to:

$$E = \sum_{n=-T_i}^{T_i} w^2(n)\,\big(s(n) - h(n)\big)^2$$

In this equation, w(n) is the analysis window and
$T_i$ is the fundamental period of the current frame.
Thus, the analysis window is centered around the
fundamental period marker and its duration is twice
this period.
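
The criterion E can be illustrated with the short Python sketch below. It evaluates E over n = -T..T and, for the simplest case of a single basis signal with one unknown amplitude, gives the closed-form minimizer. The Hann shape of the window and the helper names are assumptions made for this example; the full estimation of all harmonic amplitudes and phases would require solving a larger linear system, which is not shown.

```python
import math


def hann_window(half_length):
    """Window w(n) for n = -T..T, centered on n = 0 (assumed Hann shape)."""
    t = half_length
    return [0.5 * (1.0 + math.cos(math.pi * n / t)) for n in range(-t, t + 1)]


def weighted_error(signal, harmonic, window):
    """E = sum_n w(n)^2 * (s(n) - h(n))^2 over the analysis window."""
    return sum((w * w) * (s - h) ** 2
               for s, h, w in zip(signal, harmonic, window))


def fit_single_amplitude(signal, basis, window):
    """Amplitude A minimizing E when h(n) = A * basis(n) (one parameter)."""
    numerator = sum((w * w) * s * c for s, c, w in zip(signal, basis, window))
    denominator = sum((w * w) * c * c for c, w in zip(basis, window))
    return numerator / denominator
```

Setting the derivative of E with respect to A to zero gives the ratio used in `fit_single_amplitude`, which is the usual weighted least squares solution for one parameter.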

The analysis step 2 lastly comprises a sub-step 7
for estimating the parameters of the components of the
spectral envelope of the signal, using, for example, a
regularized discrete cepstrum method and a Bark-scale
transformation in order to reproduce the properties of
the human ear as faithfully as possible.

Thus, the analysis step 2 supplies, for each frame
of index n of speech signal samples, a scalar denoted
as $x_n$, comprising fundamental frequency information, and
a vector denoted as $y_n$, comprising spectral information
in the form of a sequence of cepstral coefficients.

Advantageously, the analysis step 2 is followed by
a step 10 for normalizing the value of the fundamental
frequency of each frame in relation to the mean
fundamental frequency, in order to replace, in each
voice sample frame, the value of the fundamental
frequency with a fundamental frequency value normalized
according to the following formula:

$$F_{\log} = \log\left(\frac{F_0}{F_0^{\mathrm{moy}}}\right)$$

In this formula, $F_0$ is the fundamental frequency of the
frame and $F_0^{\mathrm{moy}}$ corresponds to the mean of
the values of the fundamental frequencies over the
entire analyzed database.

This normalization enables modification of the
scale of the variations of the fundamental frequency
scalars in order to make it consistent with the scale
of the cepstral coefficient variations.
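
The normalization of step 10 can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that only frames with a strictly positive fundamental frequency are passed in; the inverse mapping is included because a predicted normalized value has to be converted back to hertz before synthesis.

```python
import math


def normalize_f0(f0_values_hz):
    """Return (log-normalized values, mean F0) for a list of positive F0 values."""
    mean_f0 = sum(f0_values_hz) / len(f0_values_hz)
    normalized = [math.log(f0 / mean_f0) for f0 in f0_values_hz]
    return normalized, mean_f0


def denormalize_f0(f_log, mean_f0):
    """Invert the normalization: F0 = mean * exp(F_log)."""
    return mean_f0 * math.exp(f_log)
```

A frame whose fundamental frequency equals the database mean is mapped to 0, and values above or below the mean become positive or negative respectively.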

The normalization step 10 is followed by a step 20
for determining a model representing the common
cepstrum and fundamental frequency characteristics of
all the analyzed samples.

The embodiment described involves a probabilistic
model of the fundamental frequency and of the discrete
cepstrum according to a Gaussian densities mixture
model, generally referred to as "GMM", the parameters
of which are estimated on the basis of the joint
density of the normalized fundamental frequency and the
discrete cepstrum.

In a conventional manner, the probability density
of a random variable z, denoted in a general manner as
p(z), according to a Gaussian densities mixture model
GMM, is denoted mathematically in the following manner:

$$p(z) = \sum_{i=1}^{Q} \alpha_i\, N(z;\, \mu_i, \Sigma_i)$$

where

$$\sum_{i=1}^{Q} \alpha_i = 1, \qquad 0 \le \alpha_i \le 1$$

In this formula, $N(z;\, \mu_i, \Sigma_i)$ is the probability
density of the normal law of mean $\mu_i$ and covariance
matrix $\Sigma_i$, and the coefficients $\alpha_i$ are the
coefficients of the mixture.

Thus, the coefficient $\alpha_i$ corresponds to the
a priori probability that the random variable z is
generated by the i-th Gaussian of the mixture.
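
To make the mixture density concrete, the sketch below evaluates p(z) for a scalar variable z. Restricting z to one dimension (so that each covariance matrix reduces to a variance) is a simplification made for this example only; the model of the embodiment operates on vectors.

```python
import math


def normal_pdf(z, mean, variance):
    """Density of the normal law N(z; mean, variance) for scalar z."""
    return (math.exp(-0.5 * (z - mean) ** 2 / variance)
            / math.sqrt(2.0 * math.pi * variance))


def gmm_pdf(z, weights, means, variances):
    """p(z) = sum_i alpha_i * N(z; mu_i, sigma_i^2) for scalar z."""
    if any(w < 0.0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must be non-negative and sum to 1")
    return sum(w * normal_pdf(z, m, v)
               for w, m, v in zip(weights, means, variances))
```

The check on the weights mirrors the constraint stated with the formula: each coefficient is an a priori probability, so the coefficients must sum to one.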

In a more particular manner, the step 20 for
determining the model comprises a sub-step 22 for
modeling the joint density of the cepstrum denoted as y
and the normalized fundamental frequency denoted as x,
in such a way that:

$$p(z) = p(y, x), \qquad \text{where } z = \begin{bmatrix} y \\ x \end{bmatrix}$$

In these equations, $x = [x_1, x_2, \ldots, x_N]$ corresponds
to the sequence of the scalars containing the
normalized fundamental frequency information for N
voice sample frames and $y = [y_1, y_2, \ldots, y_N]$ corresponds to
the sequence of the corresponding cepstrum coefficient
vectors.

The step 20 then comprises a sub-step 24 for
estimating the GMM parameters $(\alpha, \mu, \Sigma)$ of the density
p(z). This estimation may be implemented, for example,
with the aid of a conventional algorithm of the type
known as "EM" (Expectation Maximization), corresponding
to an iterative method by means of which an estimator
of the maximum likelihood between the speech sample
data and the Gaussian mixture model is obtained.

The initial parameters of the GMM model are
determined with the aid of a conventional
vector quantization technique.

The model determination step 20 thus supplies the
parameters of a mixture of Gaussian densities
representing common spectral characteristics,
represented by the cepstrum coefficients, and
fundamental frequencies of the analyzed voice samples.
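
The EM iteration can be illustrated on scalar data with the sketch below. It is a deliberately reduced example: the data are one-dimensional, whereas the embodiment fits joint vectors of cepstrum and normalized fundamental frequency, and the initial parameters are passed in by the caller rather than produced by vector quantization. The variance floor and the guard against numerical underflow are assumptions added for robustness.

```python
import math


def normal_pdf(z, mean, variance):
    """Density of the normal law N(z; mean, variance) for scalar z."""
    return (math.exp(-0.5 * (z - mean) ** 2 / variance)
            / math.sqrt(2.0 * math.pi * variance))


def em_gmm_1d(data, weights, means, variances, iterations=50, var_floor=1e-6):
    """Fit a scalar Gaussian mixture by EM, starting from the given parameters."""
    weights, means, variances = list(weights), list(means), list(variances)
    q = len(weights)
    for _ in range(iterations):
        # E-step: a posteriori probability of each component for each sample.
        resp = []
        for z in data:
            joint = [weights[i] * normal_pdf(z, means[i], variances[i])
                     for i in range(q)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances.
        for i in range(q):
            n_i = sum(r[i] for r in resp)
            if n_i == 0.0:
                continue  # leave an empty component unchanged
            weights[i] = n_i / len(data)
            means[i] = sum(r[i] * z for r, z in zip(resp, data)) / n_i
            spread = sum(r[i] * (z - means[i]) ** 2 for r, z in zip(resp, data))
            variances[i] = max(var_floor, spread / n_i)
    return weights, means, variances
```

Each iteration does not decrease the likelihood of the data under the model, which is why a fixed number of iterations from a reasonable initialization is often sufficient in practice.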

The method then comprises a step 30 for
determining, on the basis of the model and voice
samples, a fundamental frequency prediction function
exclusively according to spectral information supplied
by the signal cepstrum.

This prediction function is determined on the
basis of an estimator of the realization of the
fundamental frequency, given the cepstrum of the voice
samples, formed in the embodiment described by the
conditional expectation.

For this purpose, the step 30 comprises a sub-step
32 for determining the conditional expectation of the
fundamental frequency, knowing the spectrum-related
information supplied by the cepstrum. The conditional
expectation is denoted as F(y) and is determined on the
basis of the following formulae:

$$F(y) = E[x \mid y] = \sum_{i=1}^{Q} P_i(y)\left[\mu_i^{x} + \Sigma_i^{xy}\left(\Sigma_i^{yy}\right)^{-1}\left(y - \mu_i^{y}\right)\right]$$

where

$$P_i(y) = \frac{\alpha_i\, N\left(y;\, \mu_i^{y}, \Sigma_i^{yy}\right)}{\sum_{j=1}^{Q} \alpha_j\, N\left(y;\, \mu_j^{y}, \Sigma_j^{yy}\right)}$$

where

$$\Sigma_i = \begin{bmatrix} \Sigma_i^{yy} & \Sigma_i^{yx} \\ \Sigma_i^{xy} & \Sigma_i^{xx} \end{bmatrix} \quad \text{and} \quad \mu_i = \begin{bmatrix} \mu_i^{y} \\ \mu_i^{x} \end{bmatrix}$$

In these equations, $P_i(y)$ corresponds to the a
posteriori probability that the cepstrum vector y is
generated by the i-th component of the Gaussian mixture
of the model, defined in step 20 by the covariance
matrix $\Sigma_i$ and the mean $\mu_i$ of the normal law.

The determination of the conditional expectation
thus enables the fundamental frequency prediction
function to be obtained from the cepstrum information.

As a variant, the estimator implemented in step 30
may be a maximum a posteriori criterion, referred to
as "MAP", corresponding to the calculation of
the expectation exclusively for the model
best representing the source vector.
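
Both estimators of step 30 can be sketched as follows for the reduced case in which the spectral information y is a single scalar, so that the inverse of the covariance block becomes a division. This scalar restriction, the dictionary layout of the components and the function names are assumptions made for the example; with cepstral vectors, the same formulae apply with matrix operations.

```python
import math


def normal_pdf(z, mean, variance):
    """Density of the normal law N(z; mean, variance) for scalar z."""
    return (math.exp(-0.5 * (z - mean) ** 2 / variance)
            / math.sqrt(2.0 * math.pi * variance))


def posteriors(y, components):
    """P_i(y): a posteriori probability of each mixture component given y."""
    joint = [c["weight"] * normal_pdf(y, c["mean_y"], c["var_yy"])
             for c in components]
    total = sum(joint)
    return [j / total for j in joint]


def component_prediction(y, c):
    """mu_x + cov_xy / var_yy * (y - mu_y) for one component."""
    return c["mean_x"] + c["cov_xy"] / c["var_yy"] * (y - c["mean_y"])


def predict_f0_expectation(y, components):
    """F(y) = E[x | y]: posterior-weighted sum over all components."""
    post = posteriors(y, components)
    return sum(p * component_prediction(y, c)
               for p, c in zip(post, components))


def predict_f0_map(y, components):
    """MAP variant: use only the component with the highest posterior."""
    post = posteriors(y, components)
    best = max(range(len(components)), key=post.__getitem__)
    return component_prediction(y, components[best])
```

The conditional expectation blends the per-component predictions smoothly, while the MAP variant switches abruptly between components, which is cheaper to compute but can introduce discontinuities between frames.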

It is clear therefore that the analysis method
according to the invention enables, on the basis of the
model and the voice samples, a fundamental frequency
prediction function to be obtained exclusively
according to spectral information supplied, in the
embodiment described, by the cepstrum.

A prediction function of this type then enables
the fundamental frequency value for a speech signal to
be determined exclusively on the basis of spectral
information of this signal, thereby enabling a relevant
prediction of the fundamental frequency, in particular
for sounds which are not in the analyzed voice samples.

With reference to figure 2, the use of an analysis
method according to the invention will now be described
within the context of voice conversion.

Voice conversion consists in modifying the voice
signal of a reference speaker known as the "source
speaker" in such a way that the signal produced appears
to have been pronounced by a different speaker referred
to as the "target speaker".

This method is implemented using a database of
voice samples pronounced by the source speaker and the
target speaker.

In a conventional manner, a method of this type
comprises a step 50 for determining a transformation
function for the spectral characteristics of the voice
samples of the source speaker to make them resemble the
spectral characteristics of the voice samples of the
target speaker.

In the embodiment described, this step 50 is based
on an HNM analysis which enables the relationships
between the characteristics of the spectral envelope of
the voice signals of the source and target speakers to
be determined.

Source and target voice recordings corresponding
to the acoustic realization of the same phonetic
sequence are required for this purpose.

The step 50 comprises a sub-step 52 for modeling
voice samples according to an HNM sum model of harmonic
and noise signals.

The sub-step 52 is followed by a sub-step 54
enabling alignment of the source and target signals
with the aid, for example, of a conventional alignment
algorithm known as "DTW" (Dynamic Time Warping).
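
A basic form of DTW alignment can be sketched as follows. The sketch uses the classical three-way recursion (match, insertion, deletion) without slope constraints or windowing; the frame distance is supplied by the caller, and all names are hypothetical.

```python
def dtw_align(source, target, dist):
    """Align two frame sequences; return (total cost, list of index pairs)."""
    n, m = len(source), len(target)
    inf = float("inf")
    # cost[i][j]: best cumulative cost aligning source[:i] with target[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path
```

The returned index pairs give, for each source frame, the target frame or frames it is paired with, which is what is needed to build joint source and target vectors for the subsequent model estimation.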

Step 50 then comprises a sub-step 56 for
determining a model such as a GMM model representing
the common characteristics of the voice sample spectra
of the source and target speakers.

In the embodiment described, a GMM model is used 
which comprises 64 components and a single vector 
containing the cepstral parameters of the source and 
target, in such a way that a spectral transformation 
function can be defined which corresponds to an 
estimator of the realization of the target spectral 
parameters denoted as t, knowing the source spectral 
parameters denoted as s. 

In the embodiment described, this transformation
function, denoted as F(s), is expressed in the form of a
conditional expectation obtained by the following
formula:

$$F(s) = E[t \mid s] = \sum_{i=1}^{Q} P_i(s)\left[\mu_i^{t} + \Sigma_i^{ts}\left(\Sigma_i^{ss}\right)^{-1}\left(s - \mu_i^{s}\right)\right]$$

where

$$P_i(s) = \frac{\alpha_i\, N\left(s;\, \mu_i^{s}, \Sigma_i^{ss}\right)}{\sum_{j=1}^{Q} \alpha_j\, N\left(s;\, \mu_j^{s}, \Sigma_j^{ss}\right)}$$

where

$$\Sigma_i = \begin{bmatrix} \Sigma_i^{ss} & \Sigma_i^{st} \\ \Sigma_i^{ts} & \Sigma_i^{tt} \end{bmatrix} \quad \text{and} \quad \mu_i = \begin{bmatrix} \mu_i^{s} \\ \mu_i^{t} \end{bmatrix}$$

The precise determination of this function is
obtained through maximization of the likelihood
between the source and the target parameters, obtained
by means of an EM algorithm.

As a variant, the estimator may be formed from a
maximum a posteriori criterion.

The function thus defined therefore enables
modification of the spectral envelope of a speech
signal originating from the source speaker in order to
make it resemble the spectral envelope of the target
speaker.

Prior to this maximization, the parameters of the
GMM model representing the common spectral
characteristics of the source and target are
initialized, for example, with the aid of a vector
quantization algorithm.

In parallel, the analysis method according to the 
invention is implemented in a step 60 in which only the 
voice samples of the target speaker are analyzed. 

As described with reference to figure 1, the
analysis step 60 according to the invention enables a
fundamental frequency prediction function to be
obtained for the target speaker, exclusively on the
basis of spectral information.

The conversion method then comprises a step 65 in
which a voice signal to be converted, pronounced by the
source speaker, is analyzed, said signal to be
converted being different from the voice signals used
in steps 50 and 60.

This analysis step 65 is implemented, for example,
with the aid of a breakdown according to the HNM model,
enabling the provision of spectral information in the
form of cepstral coefficients, fundamental frequency
information, and maximum voicing frequency and phase
information.

This step 65 is followed by a step 70 in which the
spectral characteristics of the voice signal to be
converted are transformed by applying the
transformation function determined in step 50 to the
cepstral coefficients defined in step 65.

This step 70 in particular modifies the spectral
envelope of the voice signal to be converted.

At the end of step 70, each frame of samples of
the source speaker signal to be converted is thus
associated with transformed spectral information whose
characteristics are similar to the spectral
characteristics of the samples of the target speaker.

The conversion method then comprises a fundamental
frequency prediction step 80 for the voice samples of
the source speaker, by applying the prediction function
determined using the method according to the invention
in step 60, exclusively to the transformed spectral
information associated with the source speaker voice
signal to be converted.

In fact, as the voice samples of the source
speaker are associated with transformed spectral
information whose characteristics are similar to those
of the target speaker, the prediction function defined
in step 60 enables a relevant prediction of the
fundamental frequency to be obtained.

In a conventional manner, the conversion method
then comprises an output signal synthesis step 90,
implemented, in the example described, by an HNM
synthesis which directly supplies the converted voice
signal on the basis of the transformed spectral
envelope information supplied in step 70, the predicted
fundamental frequency information produced in step 80,
and the maximum voicing frequency and phase information
supplied by step 65.

The conversion method implementing the analysis
method according to the invention thus enables a voice
conversion to be obtained which implements spectral
modifications and a fundamental frequency prediction in
such a way as to obtain a high-quality aural rendering.
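
The sequence of steps 65, 70, 80 and 90 can be summarized by the following Python sketch. The four callables stand in for the analysis, the transformation function of step 50, the prediction function of step 60 and the synthesis; their names and signatures are hypothetical and chosen only to show how data flows between the steps.

```python
def convert_voice(frames, analyze, transform_spectrum, predict_f0, synthesize):
    """Convert source-speaker frames one by one and return the output frames."""
    converted = []
    for frame in frames:
        # Step 65: analysis of the frame to be converted. The source F0 is
        # obtained but not used for the prediction.
        cepstrum, _source_f0, voicing_info = analyze(frame)
        # Step 70: transformation of the spectral envelope toward the target.
        target_cepstrum = transform_spectrum(cepstrum)
        # Step 80: F0 predicted exclusively from the transformed spectrum.
        predicted_f0 = predict_f0(target_cepstrum)
        # Step 90: synthesis from transformed spectrum, predicted F0 and
        # the voicing and phase information of step 65.
        converted.append(synthesize(target_cepstrum, predicted_f0, voicing_info))
    return converted
```

Note that the fundamental frequency measured on the source signal does not enter the prediction: the predicted value depends only on the transformed spectral information, as the method requires.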

In particular, the effectiveness of a method of
this type can be evaluated on the basis of identical
voice samples pronounced by the source speaker and the
target speaker.

The voice signal pronounced by the source speaker
is converted with the aid of the method as described,
and the resemblance between the converted signal and
the signal pronounced by the target speaker is
evaluated.

For example, this resemblance is calculated in the
form of a ratio between the acoustic distance
separating the converted signal from the target signal
and the acoustic distance separating the target signal
from the source signal.

In calculating the acoustic distance on the basis
of the cepstral coefficients or the signal amplitude
spectrum obtained with the aid of these cepstral
coefficients, the ratio obtained for a signal converted
with the aid of the method according to the invention
is in the order of 0.3 to 0.5.
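
The ratio described above can be computed as in the sketch below. The choice of a mean Euclidean distance between cepstral vectors, and the assumption that the three signals are already time-aligned frame by frame, are made for this example only; the text does not specify the distance measure.

```python
import math


def cepstral_distance(frames_a, frames_b):
    """Mean Euclidean distance between two time-aligned cepstral sequences."""
    if len(frames_a) != len(frames_b) or not frames_a:
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(frames_a, frames_b):
        total += math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return total / len(frames_a)


def conversion_ratio(converted, target, source):
    """d(converted, target) / d(target, source); lower means closer to target."""
    return cepstral_distance(converted, target) / cepstral_distance(target, source)
```

A ratio of 1 would mean the converted signal is no closer to the target than the unconverted source was, and a ratio of 0 would mean a perfect match with the target.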

Figure 3 shows a functional block diagram of a
voice conversion system implementing the method
described with reference to figure 2.

This system uses at its input a database 100 of
voice samples pronounced by the source speaker and a
database 102 containing at least the same voice samples
pronounced by the target speaker.

These two databases are used by a module 104 which
determines a function for transforming spectral
characteristics of the source speaker into spectral
characteristics of the target speaker.

This module 104 is adapted for the implementation
of step 50 of the method as described with reference to
figure 2, and therefore enables the determination of a
spectral envelope transformation function.

Furthermore, the system comprises a module 106 for
determining a fundamental frequency prediction function
exclusively according to spectrum-related information.
To do this, the module 106 receives at its input voice
samples of the target speaker only, contained in the
database 102.

The module 106 is adapted for the implementation
of step 60 of the method described with reference to
figure 2, corresponding to the analysis method
according to the invention as described with reference
to figure 1.

The transformation function supplied by the module
104 and the prediction function supplied by the module
106 are advantageously stored with a view to subsequent
use.

The voice conversion system receives at its input
a voice signal 110 corresponding to a speech signal
pronounced by the source speaker and intended to be
converted.

The signal 110 is introduced into a signal
analysis module 112, implementing, for example, an HNM
breakdown and enabling dissociation of the spectral
information of the signal 110, in the form of cepstral
coefficients, from the fundamental frequency information.
The module 112 also supplies maximum voicing frequency
and phase information obtained by applying the HNM model.

The module 112 therefore implements the step 65 of
the method previously described.

This analysis may be carried out in
advance, with the information stored for subsequent
use.

The cepstral coefficients supplied by the module
112 are then introduced into a transformation module
114 adapted to apply the transformation function
determined by the module 104.

Thus, the transformation module 114 implements
step 70 of the method described with reference to
figure 2 and supplies the transformed cepstral
coefficients whose characteristics are similar to the
spectral characteristics of the target speaker.

The module 114 thus implements a modification of
the spectral envelope of the voice signal 110.

The transformed cepstral coefficients supplied by
the module 114 are then introduced into a fundamental
frequency prediction module 116 adapted to implement
the prediction function determined by the module 106.

Thus, the module 116 implements step 80 of the
method described with reference to figure 2 and
supplies at its output fundamental frequency
information predicted exclusively on the basis of the
transformed spectral information.

The system then comprises a synthesis module 118
receiving at its input the transformed cepstral
coefficients originating from the module 114 and
corresponding to the spectral envelope, the predicted
fundamental frequency information originating from the
module 116, and the maximum voicing frequency and phase
information supplied by the module 112.

The module 118 thus implements step 90 of the
method described with reference to figure 2 and
supplies a signal 120 corresponding to the voice signal
110 of the source speaker, except that its spectral and
fundamental frequency characteristics have been
modified in order to be similar to those of the target
speaker.

The system described may be implemented in various
ways, in particular with the aid of a suitable computer
program connected to sound acquisition hardware means.

Embodiments other than the embodiment described 
may of course be envisaged. 

In particular, the HNM and GMM models may be replaced
by other techniques and models known to the person
skilled in the art, such as, for example, LSF (Line
Spectral Frequencies) and LPC (Linear Predictive
Coding) techniques, or formant-related parameters.



